14 results for Chemometrics, Data pretreatment, variate calibration, variate curve resolution

at Duke University


Relevance:

100.00%

Publisher:

Abstract:

Most studies that apply qualitative comparative analysis (QCA) rely on macro-level data, but an increasing number of studies focus on units of analysis at the micro or meso level (i.e., households, firms, protected areas, communities, or local governments). For such studies, qualitative interview data are often the primary source of information. Yet, so far no procedure is available describing how to calibrate qualitative data as fuzzy sets. The authors propose a technique to do so and illustrate it using examples from a study of Guatemalan local governments. By spelling out the details of this important analytic step, the authors aim at contributing to the growing literature on best practice in QCA. © The Author(s) 2012.
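
The calibration step described above can be made concrete with the standard "direct method" used for numeric indicators in fuzzy-set QCA. The sketch below only illustrates that generic method, not the authors' interview-based procedure; the indicator, the three anchors, and the example values are all hypothetical.

```python
import numpy as np

def direct_calibration(x, full_non, crossover, full_mem):
    """Ragin-style direct calibration of a numeric indicator into fuzzy-set
    membership scores, using three qualitative anchors."""
    x = np.asarray(x, dtype=float)
    # Log-odds of +3 / -3 correspond to memberships of ~0.95 / ~0.05 at the anchors.
    log_odds = np.where(
        x >= crossover,
        3.0 * (x - crossover) / (full_mem - crossover),
        -3.0 * (crossover - x) / (crossover - full_non),
    )
    return 1.0 / (1.0 + np.exp(-log_odds))

# Hypothetical indicator: budget share a municipality devotes to participatory programs.
scores = direct_calibration([2, 10, 25, 40], full_non=5, crossover=15, full_mem=30)
print(scores)   # memberships in the set of "participatory local governments"
```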

Relevance:

100.00%

Publisher:

Abstract:

The recent emergence of human connectome imaging has led to high demand for angular and spatial resolution in diffusion magnetic resonance imaging (MRI). While there has been significant growth in high angular resolution diffusion imaging, improvement in spatial resolution is still limited by a number of technical challenges, such as the low signal-to-noise ratio and high motion artifacts. As a result, the benefit of a high spatial resolution in whole-brain connectome imaging has not been fully evaluated in vivo. In this brief report, the impact of spatial resolution was assessed in a newly acquired whole-brain three-dimensional diffusion tensor imaging data set with an isotropic spatial resolution of 0.85 mm. It was found that the delineation of short cortical association fibers is drastically improved, as is the definition of fiber pathway endings at the gray/white matter boundary, both of which will help construct a more accurate structural map of the human brain connectome.

Relevance:

30.00%

Publisher:

Abstract:

As more diagnostic testing options become available to physicians, it becomes more difficult to combine various types of medical information in order to optimize the overall diagnosis. To improve diagnostic performance, here we introduce an approach that optimizes a decision-fusion technique to combine heterogeneous information, such as that from different modalities, feature categories, or institutions. For classifier comparison we used two performance metrics: the area under the receiver operating characteristic (ROC) curve (AUC) and the normalized partial area under the curve (pAUC). This study used four classifiers: linear discriminant analysis (LDA), an artificial neural network (ANN), and two variants of our decision-fusion technique, AUC-optimized (DF-A) and pAUC-optimized (DF-P) decision fusion. We applied each of these classifiers with 100-fold cross-validation to two heterogeneous breast cancer data sets: one of mass lesion features and a much more challenging one of microcalcification lesion features. For the calcification data set, DF-A outperformed the other classifiers in terms of AUC (p < 0.02) and achieved AUC = 0.85 +/- 0.01. DF-P surpassed the other classifiers in terms of pAUC (p < 0.01) and reached pAUC = 0.38 +/- 0.02. For the mass data set, DF-A outperformed both the ANN and the LDA (p < 0.04) and achieved AUC = 0.94 +/- 0.01. Although for this data set there were no statistically significant differences among the classifiers' pAUC values (pAUC = 0.57 +/- 0.07 to 0.67 +/- 0.05, p > 0.10), DF-P did significantly improve specificity versus the LDA at both 98% and 100% sensitivity (p < 0.04). In conclusion, decision fusion directly optimized clinically significant performance measures, such as AUC and pAUC, and sometimes outperformed two well-known machine-learning techniques when applied to two different breast cancer data sets.
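
As a rough illustration of the two performance metrics compared above, the sketch below computes the AUC and a normalized partial AUC restricted to a high-sensitivity band, using scikit-learn and simulated scores. The 90% sensitivity cutoff and this particular pAUC convention are assumptions; the paper's exact pAUC range is not stated here.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def normalized_pauc_high_sensitivity(y_true, y_score, min_tpr=0.90):
    """Partial area under the ROC curve restricted to sensitivities above
    `min_tpr`, normalized by the maximum attainable area (1 - min_tpr)."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    grid = np.linspace(min_tpr, 1.0, 200)
    fpr_at = np.interp(grid, tpr, fpr)          # FPR as a function of TPR
    # Integrate specificity (1 - FPR) over the sensitivity band and normalize.
    return np.trapz(1.0 - fpr_at, grid) / (1.0 - min_tpr)

# Toy example with simulated scores from one hypothetical classifier.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(200), np.ones(200)])
score = np.concatenate([rng.normal(0, 1, 200), rng.normal(1.2, 1, 200)])
print("AUC  =", roc_auc_score(y, score))
print("pAUC =", normalized_pauc_high_sensitivity(y, score))
```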

Relevance:

30.00%

Publisher:

Abstract:

Empirical modeling of high-frequency currency market data reveals substantial evidence for nonnormality, stochastic volatility, and other nonlinearities. This paper investigates whether an equilibrium monetary model can account for nonlinearities in weekly data. The model incorporates time-nonseparable preferences and a transaction cost technology. Simulated sample paths are generated using Marcet's parameterized expectations procedure. The paper also develops a new method for estimation of structural economic models. The method forces the model to match (under a GMM criterion) the score function of a nonparametric estimate of the conditional density of observed data. The estimation uses weekly U.S.-German currency market data, 1975-90. © 1995.
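
The score-matching estimation idea can be illustrated with a toy version: fit a simple auxiliary model to the observed data, then choose structural parameters so that the auxiliary score, averaged over data simulated from the structural model, is close to zero under a GMM criterion. Everything below is a deliberately simplified stand-in (Gaussian auxiliary model, i.i.d. normal "structural" model, identity weighting); the paper's auxiliary model is a nonparametric conditional density and its structural model is solved via parameterized expectations.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for the weekly currency-market series (purely synthetic).
rng = np.random.default_rng(1)
observed = rng.normal(0.4, 1.3, size=2000)

# Step 1: quasi-MLE of a simple Gaussian auxiliary model on the observed data.
mu_hat, s2_hat = observed.mean(), observed.var()

def aux_score(x):
    """Score (gradient of the auxiliary log density) evaluated at the QMLE."""
    d_mu = (x - mu_hat) / s2_hat
    d_s2 = -0.5 / s2_hat + (x - mu_hat) ** 2 / (2.0 * s2_hat ** 2)
    return np.column_stack([d_mu, d_s2])

# Step 2: simulate the "structural" model with common random numbers and force
# the auxiliary score to average to zero on the simulated data (GMM criterion).
base_draws = rng.normal(size=20000)

def gmm_objective(theta):
    mu, log_sigma = theta
    simulated = mu + np.exp(log_sigma) * base_draws
    m = aux_score(simulated).mean(axis=0)      # moment conditions
    return m @ m                               # identity weighting matrix

est = minimize(gmm_objective, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print(est.x[0], np.exp(est.x[1]))              # roughly recovers (0.4, 1.3)
```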

Relevance:

30.00%

Publisher:

Abstract:

Capable of three-dimensional imaging of the cornea with micrometer-scale resolution, spectral domain optical coherence tomography (SDOCT) offers potential advantages over Placido ring and Scheimpflug photography based systems for accurate extraction of quantitative keratometric parameters. In this work, an SDOCT scanning protocol and motion correction algorithm were implemented to minimize the effects of patient motion during data acquisition. Procedures are described for correction of image data artifacts resulting from 3D refraction of SDOCT light in the cornea and from non-idealities of the scanning system geometry, performed as a prerequisite for accurate parameter extraction. Zernike polynomial 3D reconstruction and a recursive half searching algorithm (RHSA) were implemented to extract clinical keratometric parameters including anterior and posterior radii of curvature, central corneal optical power, central corneal thickness, and thickness maps of the cornea. Accuracy and repeatability of the extracted parameters obtained using a commercial 859 nm SDOCT retinal imaging system with a corneal adapter were assessed using a rigid gas permeable (RGP) contact lens as a phantom target. Extraction of these parameters was performed in vivo in 3 patients and compared to commercial Placido topography and Scheimpflug photography systems. The repeatability of SDOCT central corneal power measured in vivo was 0.18 diopters, and the difference observed between systems averaged 0.1 diopters between SDOCT and Scheimpflug photography and 0.6 diopters between SDOCT and Placido topography.
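
A minimal sketch of the parameter-extraction step, under simplifying assumptions: instead of Zernike reconstruction and a recursive half searching algorithm, the anterior surface is fit with a least-squares sphere and the radius is converted to central corneal power using the conventional keratometric index of 1.3375 (an assumption, not necessarily the paper's optical model).

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit; returns (center, radius) in the input units."""
    x, y, z = points.T
    A = np.column_stack([2 * x, 2 * y, 2 * z, np.ones_like(x)])
    b = x**2 + y**2 + z**2
    (cx, cy, cz, d), *_ = np.linalg.lstsq(A, b, rcond=None)
    radius = np.sqrt(d + cx**2 + cy**2 + cz**2)
    return np.array([cx, cy, cz]), radius

def keratometric_power(radius_mm, n_k=1.3375):
    """Central corneal power (diopters) from the anterior radius of curvature,
    using the conventional keratometric index."""
    return (n_k - 1.0) / (radius_mm * 1e-3)

# Synthetic anterior-surface points on a 7.8 mm sphere (typical cornea) with noise.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 0.4, 500)               # polar angle from the apex (rad)
phi = rng.uniform(0, 2 * np.pi, 500)
R = 7.8
pts = np.column_stack([R * np.sin(theta) * np.cos(phi),
                       R * np.sin(theta) * np.sin(phi),
                       R * np.cos(theta)]) + rng.normal(0, 0.005, (500, 3))
center, radius = fit_sphere(pts)
print(radius, keratometric_power(radius))      # ~7.8 mm, ~43 D
```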

Relevance:

30.00%

Publisher:

Abstract:

Long term, high quality estimates of burned area are needed for improving both prognostic and diagnostic fire emissions models and for assessing feedbacks between fire and the climate system. We developed global, monthly burned area estimates aggregated to 0.5° spatial resolution for the time period July 1996 through mid-2009 using four satellite data sets. From 2001 to 2009, our primary data source was 500-m burned area maps produced using Moderate Resolution Imaging Spectroradiometer (MODIS) surface reflectance imagery; more than 90% of the global area burned during this time period was mapped in this fashion. During times when the 500-m MODIS data were not available, we used a combination of local regression and regional regression trees developed over periods when burned area and Terra MODIS active fire data were available to indirectly estimate burned area. Cross-calibration with fire observations from the Tropical Rainfall Measuring Mission (TRMM) Visible and Infrared Scanner (VIRS) and the Along-Track Scanning Radiometer (ATSR) allowed the data set to be extended prior to the MODIS era. With our data set we estimated that the global annual area burned for the years 1997 to 2008 varied between 330 and 431 Mha, with the maximum occurring in 1998. We compared our data set to the recent GFED2, L3JRC, GLOBCARBON, and MODIS MCD45A1 global burned area products and found substantial differences in many regions. Lastly, we assessed the interannual variability and long-term trends in global burned area over the past 13 years. This burned area time series serves as the basis for the third version of the Global Fire Emissions Database (GFED3) estimates of trace gas and aerosol emissions.
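
The indirect estimation step can be sketched as a small regression-tree calibration: train on grid cells where both active fires and 500-m burned area maps exist, then apply the tree where only the predictors are available. The features, units, and synthetic training data below are hypothetical placeholders, not the GFED3 regression setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training table: 0.5-degree cells observed during the MODIS era,
# where both active-fire detections and mapped burned area are available.
rng = np.random.default_rng(0)
fire_counts = rng.poisson(5, 1000).astype(float)          # active fire detections
tree_cover = rng.uniform(0, 1, 1000)                      # fractional tree cover
burned_km2 = 8.0 * fire_counts * (1.2 - tree_cover) + rng.normal(0, 5, 1000)

X = np.column_stack([fire_counts, tree_cover])
model = DecisionTreeRegressor(max_depth=6, min_samples_leaf=20).fit(X, burned_km2)

# Apply the fitted tree to cells/months where only the predictors exist.
X_new = np.column_stack([[3.0, 12.0], [0.7, 0.2]])
print(model.predict(X_new))                               # estimated burned area (km^2)
```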

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: Dropouts and missing data are nearly ubiquitous in obesity randomized controlled trials, threatening validity and generalizability of conclusions. Herein, we meta-analytically evaluate the extent of missing data, the frequency with which various analytic methods are employed to accommodate dropouts, and the performance of multiple statistical methods. METHODOLOGY/PRINCIPAL FINDINGS: We searched PubMed and Cochrane databases (2000-2006) for articles published in English and manually searched bibliographic references. Articles of pharmaceutical randomized controlled trials with weight loss or weight gain prevention as major endpoints were included. Two authors independently reviewed each publication for inclusion. 121 articles met the inclusion criteria. Two authors independently extracted treatment, sample size, dropout rates, study duration, and statistical method used to handle missing data from all articles and resolved disagreements by consensus. In the meta-analysis, dropout rates were substantial, with the survival (non-dropout) rates being approximated by an exponential decay curve, exp(-lambda*t), where lambda was estimated to be .0088 (95% bootstrap confidence interval: .0076 to .0100) and t represents time in weeks. The estimated dropout rate at 1 year was 37%. Most studies used last observation carried forward as the primary analytic method to handle missing data. We also obtained 12 raw obesity randomized controlled trial datasets for empirical analyses. Analyses of raw randomized controlled trial data suggested that both mixed models and multiple imputation performed well, but that multiple imputation may be more robust when missing data are extensive. CONCLUSION/SIGNIFICANCE: Our analysis offers an equation for prediction of dropout rates useful for future study planning. Our raw data analyses suggest that multiple imputation is better than other methods for handling missing data in obesity randomized controlled trials, followed closely by mixed models. We suggest these methods supplant last observation carried forward as the primary method of analysis.
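
The reported retention curve gives a simple planning tool: with survival approximated by exp(-lambda*t) and lambda of roughly 0.0088 per week, the expected dropout fraction at any horizon follows directly, as in the sketch below (which reproduces the approximately 37% one-year figure).

```python
import numpy as np

def expected_dropout(weeks, lam=0.0088):
    """Dropout fraction implied by the exponential retention curve exp(-lambda * t),
    with lambda per week as reported in the meta-analysis."""
    return 1.0 - np.exp(-lam * np.asarray(weeks, dtype=float))

# Planning example: expected attrition at 6 months and 1 year.
print(expected_dropout([26, 52]))   # roughly [0.20, 0.37]
```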

Relevance:

30.00%

Publisher:

Abstract:

Quantitative optical spectroscopy has the potential to provide an effective, low-cost, and portable solution for cervical pre-cancer screening in resource-limited communities. However, clinical studies to validate the use of this technology in resource-limited settings require low power consumption and good quality control that is minimally influenced by the operator or variable environmental conditions in the field. The goal of this study was to evaluate the effects of two sources of potential error, calibration and pressure, on the extraction of absorption and scattering properties of normal cervical tissues in a resource-limited setting in Leogane, Haiti. Our results show that self-calibrated measurements improved scattering measurements through real-time correction of system drift, in addition to minimizing the time required for post-calibration. Variations in pressure (tested without the potential confounding effects of calibration error) caused local changes in vasculature and scatterer density that significantly impacted the tissue absorption and scattering properties. Future spectroscopic systems intended for clinical use, particularly where operator training is not viable and environmental conditions are unpredictable, should incorporate a real-time self-calibration channel and collect diffuse reflectance spectra at a consistent pressure to maximize data integrity.

Relevance:

30.00%

Publisher:

Abstract:

Hydrologic research is a very demanding application of fiber-optic distributed temperature sensing (DTS) in terms of precision, accuracy, and calibration. The physics behind the most frequently used DTS instruments is considered as it applies to four calibration methods for single-ended DTS installations. The new methods presented are more accurate than the instrument-calibrated data, achieving accuracies on the order of tenths of a degree root mean square error (RMSE) and mean bias. Effects of localized non-uniformities that violate the assumptions of single-ended calibration data are explored and quantified. Experimental design considerations, such as selection of integration times or selection of the length of the reference sections, are discussed, and the impacts of these considerations on calibrated temperatures are explored in two case studies.
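
For context, a commonly used single-ended calibration form expresses temperature as T(z) = gamma / (ln(P_S/P_aS) + C - dalpha*z), with the three parameters solvable exactly from three reference sections held at known temperatures. The sketch below uses that generic form with made-up intensities; it is not necessarily the exact formulation or data handling evaluated in the paper.

```python
import numpy as np

def dts_temperature(z, stokes, anti_stokes, gamma, C, dalpha):
    """Temperature (K) from a commonly used single-ended DTS calibration form:
    T(z) = gamma / (ln(Ps/Pas) + C - dalpha * z)."""
    return gamma / (np.log(stokes / anti_stokes) + C - dalpha * z)

def calibrate_three_sections(z, T, stokes, anti_stokes):
    """Solve (gamma, C, dalpha) exactly from three reference sections.
    Rearranging T = gamma/(L + C - dalpha*z) with L = ln(Ps/Pas) gives the
    linear system: gamma - T*C + T*z*dalpha = T*L."""
    L = np.log(stokes / anti_stokes)
    A = np.column_stack([np.ones(3), -T, T * z])
    return np.linalg.solve(A, T * L)            # (gamma, C, dalpha)

# Hypothetical reference sections (two calibration baths plus one far section).
z_ref = np.array([10.0, 20.0, 1500.0])          # m along the fiber
T_ref = np.array([273.3, 313.2, 285.0])         # K, independently measured
stokes = np.array([1.00, 1.00, 0.60])           # made-up backscatter intensities
anti_stokes = np.array([0.170, 0.195, 0.105])
gamma, C, dalpha = calibrate_three_sections(z_ref, T_ref, stokes, anti_stokes)
print(gamma, C, dalpha)
```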

Relevance:

30.00%

Publisher:

Abstract:

OBJECTIVES: To compare the predictive performance and potential clinical usefulness of risk calculators of the European Randomized Study of Screening for Prostate Cancer (ERSPC RC) with and without information on prostate volume. METHODS: We studied 6 cohorts (5 European and 1 US) with a total of 15,300 men, all biopsied and with pre-biopsy TRUS measurements of prostate volume. Volume was categorized into 3 categories (25, 40, and 60 cc) to reflect use of digital rectal examination (DRE) for volume assessment. Risks of prostate cancer were calculated according to an ERSPC DRE-based RC (including PSA, DRE, prior biopsy, and prostate volume) and a PSA + DRE model (including PSA, DRE, and prior biopsy). Missing data on prostate volume were completed by single imputation. Risk predictions were evaluated with respect to calibration (graphically), discrimination (area under the ROC curve, AUC), and clinical usefulness (net benefit, graphically assessed in decision curves). RESULTS: The AUCs of the ERSPC DRE-based RC ranged from 0.61 to 0.77 and were substantially larger than the AUCs of a model based on only PSA + DRE (ranging from 0.56 to 0.72) in each of the 6 cohorts. The ERSPC DRE-based RC provided net benefit over performing a prostate biopsy on the basis of PSA and DRE outcome in five of the six cohorts. CONCLUSIONS: Identifying men at increased risk of having biopsy-detectable prostate cancer should consider multiple factors, including an estimate of prostate volume.
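
Clinical usefulness in the decision curves is summarized by net benefit, which weighs true positives against false positives at a given threshold probability. A minimal sketch of that calculation (Vickers-Elkin style), with hypothetical risk predictions, is shown below.

```python
import numpy as np

def net_benefit(y_true, risk, threshold):
    """Net benefit of biopsying men whose predicted risk exceeds `threshold`
    (decision-curve analysis)."""
    y_true = np.asarray(y_true, dtype=bool)
    treat = np.asarray(risk) >= threshold
    n = y_true.size
    tp = np.sum(treat & y_true) / n
    fp = np.sum(treat & ~y_true) / n
    return tp - fp * threshold / (1.0 - threshold)

def biopsy_all(y_true, threshold):
    """Net benefit of the 'biopsy everyone' strategy, for comparison."""
    prevalence = np.mean(y_true)
    return prevalence - (1 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical example: predicted risks for 5 men, 2 with cancer on biopsy.
y = [0, 1, 0, 1, 0]
risk = [0.05, 0.40, 0.20, 0.70, 0.10]
print(net_benefit(y, risk, threshold=0.20), biopsy_all(y, 0.20))
```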

Relevance:

30.00%

Publisher:

Abstract:

Interest in structural brain connectivity has grown with the understanding that abnormal neural connections may play a role in neurologic and psychiatric diseases. Small animal connectivity mapping techniques are particularly important for identifying aberrant connectivity in disease models. Diffusion magnetic resonance imaging tractography can provide nondestructive, 3D, brain-wide connectivity maps, but has historically been limited by low spatial resolution, low signal-to-noise ratio, and the difficulty in estimating multiple fiber orientations within a single image voxel. Small animal diffusion tractography can be substantially improved through the combination of ex vivo MRI with exogenous contrast agents, advanced diffusion acquisition and reconstruction techniques, and probabilistic fiber tracking. Here, we present a comprehensive, probabilistic tractography connectome of the mouse brain at microscopic resolution, and a comparison of these data with neuronal tracer-based connectivity data from the Allen Brain Atlas. This work serves as a reference database for future tractography studies in the mouse brain, and demonstrates the fundamental differences between tractography and neuronal tracer data.

Relevance:

30.00%

Publisher:

Abstract:

PURPOSE: X-ray computed tomography (CT) is widely used, both clinically and preclinically, for fast, high-resolution anatomic imaging; however, compelling opportunities exist to expand its use in functional imaging applications. For instance, spectral information combined with nanoparticle contrast agents enables quantification of tissue perfusion levels, while temporal information details cardiac and respiratory dynamics. The authors propose and demonstrate a projection acquisition and reconstruction strategy for 5D CT (3D+dual energy+time) which recovers spectral and temporal information without substantially increasing radiation dose or sampling time relative to anatomic imaging protocols. METHODS: The authors approach the 5D reconstruction problem within the framework of low-rank and sparse matrix decomposition. Unlike previous work on rank-sparsity constrained CT reconstruction, the authors establish an explicit rank-sparse signal model to describe the spectral and temporal dimensions. The spectral dimension is represented as a well-sampled time- and energy-averaged image plus regularly undersampled principal components describing the spectral contrast. The temporal dimension is represented as the same time- and energy-averaged reconstruction plus contiguous, spatially sparse, and irregularly sampled temporal contrast images. Using a nonlinear, image-domain filtration approach that the authors refer to as rank-sparse kernel regression, the authors transfer image structure from the well-sampled time- and energy-averaged reconstruction to the spectral and temporal contrast images. This regularization strategy strictly constrains the reconstruction problem while approximately separating the temporal and spectral dimensions. Separability results in a highly compressed representation for the 5D data in which projections are shared between the temporal and spectral reconstruction subproblems, enabling substantial undersampling. The authors solved the 5D reconstruction problem using the split Bregman method and GPU-based implementations of backprojection, reprojection, and kernel regression. Using a preclinical mouse model, the authors apply the proposed algorithm to study myocardial injury following radiation treatment of breast cancer. RESULTS: Quantitative 5D simulations are performed using the MOBY mouse phantom. Twenty data sets (ten cardiac phases, two energies) are reconstructed with 88 μm isotropic voxels from 450 total projections acquired over a single 360° rotation. In vivo 5D myocardial injury data sets acquired in two mice injected with gold and iodine nanoparticles are also reconstructed with 20 data sets per mouse using the same acquisition parameters (dose: ∼60 mGy). For both the simulations and the in vivo data, the reconstruction quality is sufficient to perform material decomposition into gold and iodine maps to localize the extent of myocardial injury (gold accumulation) and to measure cardiac functional metrics (vascular iodine). The 5D CT imaging protocol represents a 95% reduction in radiation dose per cardiac phase and energy and a 40-fold decrease in projection sampling time relative to the authors' standard imaging protocol. CONCLUSIONS: The 5D CT data acquisition and reconstruction protocol efficiently exploits the rank-sparse nature of spectral and temporal CT data to provide high-fidelity reconstruction results without increased radiation dose or sampling time.
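
The low-rank plus sparse idea underlying the signal model can be illustrated with a generic matrix decomposition: alternating singular-value thresholding and soft thresholding splits a data matrix into a low-rank part (loosely analogous to the time- and energy-averaged structure) and a sparse part (loosely analogous to the contrast images). This is a simplified stand-in, not the authors' split Bregman, GPU-based 5D reconstruction.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, lam):
    """Soft thresholding: proximal operator of the L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

def low_rank_plus_sparse(M, tau=1.0, lam=0.1, iters=100):
    """Alternating minimization of 0.5*||M - L - S||_F^2 + tau*||L||_* + lam*||S||_1."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S, tau)
        S = soft(M - L, lam)
    return L, S

# Toy data: rank-2 "averaged" structure plus a sparse "contrast" component.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 40))
sparse = np.zeros((100, 40))
sparse[rng.integers(0, 100, 50), rng.integers(0, 40, 50)] = 5.0
L_hat, S_hat = low_rank_plus_sparse(low_rank + sparse, tau=2.0, lam=0.5)
```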

Relevance:

30.00%

Publisher:

Abstract:

The outcomes of both (i) radiation therapy and (ii) preclinical small animal radiobiology studies depend on the delivery of a known quantity of radiation to a specific and intentional location. Adverse effects can result from these procedures if the dose to the target is too high or too low, and can also result from an incorrect spatial distribution in which nearby normal healthy tissue can be undesirably damaged by poor radiation delivery techniques. Thus, in mice and humans alike, the spatial dose distributions from radiation sources should be well characterized in terms of the absolute dose quantity, and with pinpoint accuracy. When dealing with the steep spatial dose gradients consequential to either (i) high dose rate (HDR) brachytherapy or (ii) the small organs and tissue inhomogeneities of mice, obtaining accurate and highly precise dose results can be very challenging, given that commercially available radiation detection tools, such as ion chambers, are often too large for in vivo use.

In this dissertation two tools are developed and applied for both clinical and preclinical radiation measurement. The first tool is a novel radiation detector for acquiring physical measurements, fabricated from an inorganic nano-crystalline scintillator that has been fixed on an optical fiber terminus. This dosimeter allows for the measurement of point doses to sub-millimeter resolution, and has the ability to be placed in-vivo in humans and small animals. Real-time data is displayed to the user to provide instant quality assurance and dose-rate information. The second tool utilizes an open source Monte Carlo particle transport code, and was applied for small animal dosimetry studies to calculate organ doses and recommend new techniques of dose prescription in mice, as well as to characterize dose to the murine bone marrow compartment with micron-scale resolution.

Hardware design changes were implemented to reduce the overall fiber diameter to <0.9 mm for the nano-crystalline scintillator based fiber optic detector (NanoFOD) system. The lower limit of device sensitivity was found to be approximately 0.05 cGy/s. Herein, this detector was demonstrated to perform quality assurance of clinical 192Ir HDR brachytherapy procedures, providing dose measurements comparable to thermoluminescent dosimeters and accuracy within 20% of the treatment planning software (TPS) for the 27 treatments conducted, with an interquartile range of the measured-to-TPS dose ratio of 0.08 (0.94 to 1.02). After removing contaminant signals (Cerenkov and diode background), calibration of the detector enabled accurate dose measurements for vaginal applicator brachytherapy procedures. For 192Ir use, the energy response changed by a factor of 2.25 over SDD values of 3 to 9 cm; however, a cap made of 0.2 mm thick silver reduced the energy dependence to a factor of 1.25 over the same SDD range, at the cost of reducing overall sensitivity by 33%.

For preclinical measurements, dose accuracy of the NanoFOD was within 1.3% of MOSFET-measured dose values in a cylindrical mouse phantom at 225 kV for x-ray irradiation at angles of 0, 90, 180, and 270°. The NanoFOD exhibited small changes in angular sensitivity, with a coefficient of variation (COV) of 3.6% at 120 kV and 1% at 225 kV. When the NanoFOD was placed alongside a MOSFET in the liver of a sacrificed mouse and treatment was delivered at 225 kV with a 0.3 mm Cu filter, the dose difference was only 1.09% with use of the 4x4 cm collimator, and -0.03% with no collimation. Additionally, the NanoFOD utilized a scintillator of 11 µm thickness to measure small x-ray fields for microbeam radiation therapy (MRT) applications, and achieved dose accuracy within 2.7% of radiochromic film at the microbeam peak. Modest differences in the measured full-width at half maximum lateral dimension of the microbeam were observed between the NanoFOD (420 µm) and radiochromic film (320 µm), but these differences are explained mostly as an artifact of the geometry used and volumetric effects in the scintillator material. Characterization of the energy dependence of the yttrium-oxide based scintillator material was performed in the range of 40-320 kV (2 mm Al filtration), and the maximum device sensitivity was achieved at 100 kV. Tissue maximum ratio measurements were carried out on a small animal x-ray irradiator system at 320 kV and demonstrated an average difference of 0.9% as compared to a MOSFET dosimeter in the range of 2.5 to 33 cm depth in tissue equivalent plastic blocks. Irradiation of the NanoFOD fiber and scintillator material on a 137Cs gamma irradiator to 1600 Gy did not produce any measurable change in light output, suggesting that the NanoFOD system may be reused without the need for replacement or recalibration over its lifetime.

For small animal irradiator systems, researchers can deliver a given dose to a target organ by controlling exposure time. Currently, researchers calculate this exposure time by dividing the total dose that they wish to deliver by a single provided dose rate value. This method is independent of the target organ. Studies conducted here used Monte Carlo particle transport codes to justify a new method of dose prescription in mice that considers organ-specific doses. Monte Carlo simulations were performed in the Geant4 Application for Tomographic Emission (GATE) toolkit using a MOBY mouse whole-body phantom. The non-homogeneous phantom was composed of 256x256x800 voxels of size 0.145x0.145x0.145 mm3. Differences of up to 20-30% in dose to soft-tissue target organs were demonstrated, and methods for alleviating these errors during whole-body irradiation of mice were suggested, utilizing organ-specific and x-ray tube filter-specific dose rates for all irradiations.
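
The proposed prescription change amounts to dividing the prescribed dose by an organ- and filter-specific dose rate rather than by a single machine-wide value; a minimal sketch with placeholder dose-rate values is shown below.

```python
# Hypothetical organ- and filter-specific dose rates (cGy/min), such as those that
# could be tabulated from GATE/MOBY simulations; the numbers are placeholders.
dose_rates = {
    ("liver", "0.3 mm Cu"): 210.0,
    ("lung",  "0.3 mm Cu"): 245.0,
    ("liver", "2.0 mm Al"): 260.0,
}

def exposure_time(prescribed_dose_cgy, organ, x_ray_filter):
    """Exposure time (min) using an organ- and filter-specific dose rate
    instead of a single machine-wide value."""
    return prescribed_dose_cgy / dose_rates[(organ, x_ray_filter)]

print(exposure_time(400.0, "liver", "0.3 mm Cu"))   # minutes for a 4 Gy liver dose
```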

Monte Carlo analysis was used on 1 µm resolution CT images of a mouse femur and a mouse vertebra to calculate the dose gradients within the bone marrow (BM) compartment of mice for different radiation beam qualities relevant to x-ray and isotope type irradiators. Results indicated that soft x-ray beams (160 kV at 0.62 mm Cu HVL and 320 kV at 1 mm Cu HVL) lead to substantially higher doses to BM in close proximity to mineral bone (within about 60 µm) as compared to hard x-ray beams (320 kV at 4 mm Cu HVL) and isotope based gamma irradiators (137Cs). The average dose increases to the BM in the vertebra for these four radiation beam qualities were found to be 31%, 17%, 8%, and 1%, respectively. Both in vitro and in vivo experimental studies confirmed these simulation results, demonstrating that the 320 kV, 1 mm Cu HVL beam caused statistically significantly greater killing of BM cells at 6 Gy dose levels than both the 320 kV, 4 mm Cu HVL and the 662 keV 137Cs beams.

Relevance:

30.00%

Publisher:

Abstract:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
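
A schematic version of the inference loop described above, assuming a generic random-walk Metropolis search over the parameters of a user-supplied prediction function scored by correlation with the observed coverage; the thermodynamic protein-DNA model itself is not reproduced here, and the step size, temperature, and toy example are arbitrary.

```python
import numpy as np

def mcmc_maximize_correlation(predict, observed, theta0, n_steps=5000,
                              step_sd=0.05, temperature=0.01, seed=0):
    """Random-walk Metropolis targeting exp(corr(predict(theta), observed) / T),
    used here simply to search for parameters that maximize the correlation
    objective. `predict` maps a parameter vector to a predicted coverage profile."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    score = np.corrcoef(predict(theta), observed)[0, 1]
    best_theta, best_score = theta.copy(), score
    for _ in range(n_steps):
        proposal = theta + rng.normal(0.0, step_sd, size=theta.size)
        new_score = np.corrcoef(predict(proposal), observed)[0, 1]
        if np.log(rng.uniform()) < (new_score - score) / temperature:
            theta, score = proposal, new_score
            if score > best_score:
                best_theta, best_score = theta.copy(), score
    return best_theta, best_score

# Toy usage: recover the position of a Gaussian "footprint" in a noisy profile.
x = np.arange(200)
observed = np.exp(-0.5 * ((x - 120) / 10.0) ** 2) \
    + np.random.default_rng(1).normal(0, 0.1, 200)
predict = lambda theta: np.exp(-0.5 * ((x - theta[0]) / 10.0) ** 2)
theta_hat, corr = mcmc_maximize_correlation(predict, observed, theta0=[100.0], step_sd=2.0)
```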

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
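
A toy stand-in for the nucleosome scoring idea: score a 147 bp window by the log-likelihood ratio of the observed DNase I cut counts under a position-specific (here, artificially oscillatory) rate profile versus a flat background. The real method is a Bayes factor that jointly models the quadratic and oscillatory digestion patterns; the profile and rates below are hypothetical.

```python
import numpy as np
from scipy.stats import poisson

def nucleosome_score(cut_counts, nuc_rate_profile, background_rate):
    """Log-likelihood ratio of DNase I cut counts across a 147 bp window under a
    position-specific 'nucleosome' rate profile versus a flat background rate."""
    ll_nuc = poisson.logpmf(cut_counts, nuc_rate_profile).sum()
    ll_bg = poisson.logpmf(cut_counts, background_rate).sum()
    return ll_nuc - ll_bg

# Hypothetical ~10 bp oscillatory rate profile over a 147 bp nucleosome body.
pos = np.arange(147)
nuc_profile = 0.5 + 0.4 * np.cos(2 * np.pi * pos / 10.3)
counts = np.random.default_rng(0).poisson(nuc_profile)
print(nucleosome_score(counts, nuc_profile, background_rate=0.5))
```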

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
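
As a drastically simplified stand-in for the state-space model and its EM algorithm, the sketch below runs Baum-Welch on a two-state hidden Markov model with Poisson read-count emissions ("bound" versus "unbound"); the actual SSM jointly models multiple data types and richer protein binding configurations.

```python
import numpy as np
from scipy.stats import poisson

def poisson_hmm_em(counts, n_iter=50):
    """EM (Baum-Welch) for a two-state HMM with Poisson emissions:
    state 0 ~ 'unbound' (low read rate), state 1 ~ 'bound' (high read rate)."""
    counts = np.asarray(counts)
    T = counts.size
    lam = np.array([counts.mean() * 0.5, counts.mean() * 2.0])   # emission rates
    A = np.array([[0.95, 0.05], [0.10, 0.90]])                   # transition matrix
    pi = np.array([0.5, 0.5])                                    # initial distribution
    for _ in range(n_iter):
        B = poisson.pmf(counts[:, None], lam[None, :])           # (T, 2) emission likelihoods
        # E-step: scaled forward-backward recursions.
        alpha = np.zeros((T, 2)); beta = np.zeros((T, 2)); c = np.zeros(T)
        alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                                     # P(state_t | all data)
        xi = (alpha[:-1, :, None] * A[None] *
              (B[1:] * beta[1:])[:, None, :] / c[1:, None, None])
        # M-step: update initial, transition, and emission parameters.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        lam = (gamma * counts[:, None]).sum(axis=0) / gamma.sum(axis=0)
    return lam, A, gamma

# Toy usage: simulated read counts with a "bound" segment in the middle.
rng = np.random.default_rng(0)
true_rates = np.r_[np.full(300, 1.0), np.full(100, 6.0), np.full(300, 1.0)]
lam_hat, A_hat, posterior = poisson_hmm_em(rng.poisson(true_rates))
```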

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.